Learn to human-level control in dynamic environment using incremental batch interrupting temporal abstraction

Authors

  • Yuchen Fu
  • Zhipeng Xu
  • Fei Zhu
  • Quan Liu
  • Xiaoke Zhou
Abstract

The temporal world is characterized by dynamics and variance. Many machine learning algorithms are difficult to apply directly to practical control applications, whereas hierarchical reinforcement learning can be used to deal with them. It is also common to have partial solutions available, called options, which are learned from knowledge or predefined by the system and which solve sub-tasks of the problem. Options can be reused for policy determination in control. Many traditional semi-Markov decision process methods take advantage of them, but most treat an option as a primitive object. Because the environment is uncertain and variable, such methods cannot deal effectively with real-world control problems. Based on the idea of interrupting options in a dynamic environment, a Q-learning control method that uses temporal abstraction, named I-QOption, is introduced. The I-QOption approach combines the idea of interruption with the characteristics of the dynamic environment so that it can learn and improve a control policy in that environment. The Q-learning framework helps the method learn from interaction with raw data and achieve human-level control. The I-QOption algorithm is applied to grid world, a benchmark for evaluating methods in dynamic environments. The experimental results show that the proposed algorithm can learn and improve its policy effectively in a dynamic environment.
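The abstract does not spell out the I-QOption update, so the following is only a minimal sketch of the general idea it names: Q-learning over options, where a running option may be interrupted. It assumes the standard interruption rule from the options framework (stop an option when its value in the current state falls below the best available option) and the usual SMDP Q-learning target. The function names, state labels, option names, and the values of `GAMMA` and `ALPHA` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value, not from the paper)
ALPHA = 0.5  # learning rate (assumed value, not from the paper)


def should_interrupt(q, state, current_option, options):
    """Return True when continuing the current option is valued below
    the best option available in this state (standard interruption rule)."""
    best = max(q[(state, o)] for o in options)
    return q[(state, current_option)] < best


def smdp_q_update(q, state, option, discounted_return, duration,
                  next_state, options, terminal=False):
    """Apply one SMDP Q-learning update for an option that ran for
    `duration` primitive steps and collected `discounted_return`."""
    best_next = 0.0 if terminal else max(q[(next_state, o)] for o in options)
    target = discounted_return + (GAMMA ** duration) * best_next
    q[(state, option)] += ALPHA * (target - q[(state, option)])
    return q[(state, option)]


# Hypothetical two-option example with hand-set values.
options = ["go_left", "go_right"]
q = defaultdict(float)
q[("s1", "go_left")] = 0.2
q[("s1", "go_right")] = 0.8

# While running "go_left" in s1, a better option exists, so interrupt.
interrupt_left = should_interrupt(q, "s1", "go_left", options)
# "go_right" is already the best option in s1, so keep running it.
interrupt_right = should_interrupt(q, "s1", "go_right", options)

# An option started in s0 ran 3 steps, earned return 1.0, and ended in s1.
updated = smdp_q_update(q, "s0", "go_right", 1.0, 3, "s1", options)
```

Treating the option as interruptible means the agent re-evaluates its choice at every step instead of committing until the option's own termination condition, which is what lets the policy react when the environment changes mid-option.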

Similar resources

Seismic Reliability Analysis of Offshore Fixed Platforms Using Incremental Dynamic Analysis

It is generally accepted that performance-based design has to be reliability-based. Seismic performance evaluation is based on nonlinear dynamics and reliability theory taking into account uncertainties during analysis. Considering the economic importance of jacket type offshore platforms, the present research aims to assess the seismic performance of offshore steel platforms. In this study, th...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Self-Organizing Perceptual and Temporal Abstraction for Robot Reinforcement Learning

A major current challenge in reinforcement learning research is to extend methods that work well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation using high-resolution sensors. We present a method whereby a robot in a continuous world can, with little prior knowledge of its sensorimotor system, environment, and ...

Seismic Design of Steel Structures Based on Ductility and Incremental Nonlinear Dynamic Analysis

In this paper a simple tool for seismic design of steel structures for a selected ductility level is presented. For this purpose, a consistent set of earthquakes is selected and sorted based on the maximum acceleration of ground surface. The selected records are applied as the base motion to a single-degree-of-freedom system with strain hardening and the maximum response acceleration is determi...

Batch-Incremental versus Instance-Incremental Learning in Dynamic and Evolving Data

Many real world problems involve the challenging context of data streams, where classifiers must be incremental: able to learn from a theoretically-infinite stream of examples using limited time and memory, while being able to predict at any point. Two approaches dominate the literature: batch-incremental methods that gather examples in batches to train models; and instance-incremental methods ...


Journal:
  • Comput. Sci. Inf. Syst.

Volume 13  Issue 

Pages  -

Publication date 2016